Kuaishou and Shanghai Jiao Tong University jointly launch the Orthus model, pushing the boundaries of multimodal generation
Kuaishou and Shanghai Jiao Tong University announced Orthus, an open-source multimodal generation model, at the ICML conference. The model is built on an autoregressive Transformer architecture and performs strongly on tasks that convert between text and images, with significantly higher computational efficiency than existing models. Orthus processes text and image features through separate modality-specific generation heads and fuses them efficiently in a unified representation space. According to the research team, the model outperforms specialized models on image understanding and text-to-image generation, and it also shows strong potential in applications such as image editing, which the team presents as a breakthrough in multimodal generation.
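The announcement gives no implementation details. As a hedged illustration of the general idea of one shared backbone feeding modality-specific heads, the toy sketch below routes a single hidden state either to a text head (logits over a vocabulary, decoded greedily to a token id) or to an image head (a continuous feature vector). Every name, size, and the averaging "backbone" are hypothetical stand-ins rather than Orthus's actual design; the article does not describe what form Orthus's heads take, and the linear image head here is chosen only for brevity.

```python
import random
from typing import List, Tuple, Union

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; a real model would learn these."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def toy_backbone(sequence: List[Vector]) -> Vector:
    """Hypothetical stand-in for the autoregressive Transformer:
    averages the prefix embeddings, which all live in one shared space."""
    n, dim = len(sequence), len(sequence[0])
    return [sum(v[i] for v in sequence) / n for i in range(dim)]


class ModalityHeads:
    """Hypothetical modality-specific heads reading the same hidden state."""

    def __init__(self, hidden_dim: int, vocab_size: int,
                 image_feat_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Text head: projects the hidden state to vocabulary logits.
        self.text_head = random_matrix(vocab_size, hidden_dim, rng)
        # Image head: projects the hidden state to a continuous feature vector.
        self.image_head = random_matrix(image_feat_dim, hidden_dim, rng)

    def __call__(self, hidden: Vector,
                 modality: str) -> Tuple[str, Union[int, Vector]]:
        if modality == "text":
            logits = matvec(self.text_head, hidden)
            # Greedy decoding: pick the highest-scoring token id.
            return "text", max(range(len(logits)), key=logits.__getitem__)
        if modality == "image":
            return "image", matvec(self.image_head, hidden)
        raise ValueError(f"unknown modality: {modality!r}")


# Usage: one shared hidden state, two different output heads.
heads = ModalityHeads(hidden_dim=4, vocab_size=10, image_feat_dim=3)
prefix = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.1, 0.0, 0.2]]
h = toy_backbone(prefix)
kind, token_id = heads(h, "text")
kind2, feat = heads(h, "image")
```

The point of the split is that discrete text tokens and continuous image features need different output forms, while the shared backbone lets both modalities condition on the same interleaved context.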